Hadoop Scheduling Base On Data Locality

نویسندگان

  • Bo Jiang
  • Jiaying Wu
  • Xiuyu Shi
  • Ruhuan Huang
چکیده

In hadoop, the job scheduling is an independent module, users can design their own job scheduler based on their actual application requirements, thereby meet their specific business needs. Currently, hadoop has three schedulers: FIFO, computing capacity scheduling and fair scheduling policy, all of them are take task allocation strategy that considerate data locality simply. They neither support data locality well nor fully apply to all cases of jobs scheduling. In this paper, we took the concept of resources-prefetch into consideration, and proposed a job scheduling algorithm based on data locality. By estimate the remaining time to complete a task, compared with the time to transfer a resources block, to preselect candidate nodes for task allocation. Then we preselect a non-local map tasks from the unfinished job queue as resources-prefetch tasks. Getting information of resources blocks of preselected map task, select a nearest resources blocks from the candidate node and transferred to local through network. Thus we would ensure data locality good enough. Eventually, we design a experiment and proved resources-prefetch method can guarantee good job data locality and reduce the time to complete the job to a certain extent.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Shareability and Locality Aware Scheduling Algorithm in Hadoop for Mobile Cloud Computing

Using different scheduling algorithms can affect the performance of mobile cloud computing using Hadoop MapReduce framework. In Hadoop MapReduce framework, the default scheduling algorithm is First-In-First-Out (FIFO). However, the FIFO scheduler simply schedules task according to its arrival time and does not consider any other factors that may have great impact on system performance. As a res...

متن کامل

Locality Aware Fair Scheduling for Hammr

Hammr is a distributed execution engine for data parallel applications modeled after Dryad. In this report, we present a locality aware fair scheduler for Hammr. We have developed functionality to support hierarchical scheduling, preemption and weighed users and a minimum flow based algorithm to maximize task preference. For evaluation, we’ve run Hammr on Hadoop Distributed File System on Amazo...

متن کامل

Maximizing Data Locality in Hadoop Clusters via Controlled Reduce Task Scheduling

The overall goal of this project is to gain a hands-on experience with working on a large open-ended research-oriented project using the Hadoop framework. Hadoop is an open source implementation of MapReduce and Google File System, and is currently enjoying wide popularity. Students will modify the task scheduler of Hadoop, conduct several experimental studies, and analyze performance and netwo...

متن کامل

Performance Impact of Data Locality in MapReduce on Hadoop

As the foundation for MapReduce processing, Hadoop is one of the fundamental technologies in big data analytics. Hadoop breaks up large data into data blocks, replicates them, and stores them in a distributed storage system. Data blocks can be placed in a machine where the data will be processed (data local), in a machine in the same rack (rack-local), or in a machine in a different rack (off-r...

متن کامل

Scheduling algorithm based on prefetching in MapReduce clusters

Due to cluster resource competition and task scheduling policy, some map tasks are assigned to nodes without input data, which causes significant data access delay. Data locality is becoming one of the most critical factors to affect performance of MapReduce clusters. As machines in MapReduce clusters have large memory capacities, which are often underutilized, in-memory prefetching input data ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1506.00425  شماره 

صفحات  -

تاریخ انتشار 2015